1. Introduction



This paper proposes an application of the proximal minimization algorithm to the following problem:

$$D: \quad \mathcal{L}(u^*) = \min_{u \in U} \mathcal{L}(u)$$

where $\mathcal{L}(u)$ is convex and $U$ is a compact subset of $R^m$. In particular, $U$ is assumed to be a polyhedron of the form $\{u : Au \le b \text{ and } u \in R^m\}$, where $A$ is a $p \times m$ matrix and $b$ is a vector in $R^p$. To simplify the presentation and motivate applications to Lagrangian duality and variational inequalities, the objective function $\mathcal{L}(u)$ is also assumed to have the following form:

$$\mathcal{L}(u) = \max_{x \in X} \{ f(x) + u \cdot g(x) \} \qquad (1)$$

where $X$ is a compact subset of $R^n$, $f(x)$ is a real-valued function on $R^n$, and $g(x)$ is a vector-valued function mapping $R^n$ to $R^m$. The notation $a \cdot b$ denotes the usual dot product between two vectors, $a$ and $b$.
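As an illustration of the form in (1), the following minimal sketch evaluates the dual function by enumeration. It assumes a scalar $u$ and a finite set $X$; the toy data `X`, `f` and `g` and the helper name `dual_value` are hypothetical and are not taken from the paper.

```python
def dual_value(u, X, f, g):
    """Return (L(u), x_best) with L(u) = max over x in X of f(x) + u * g(x)."""
    x_best = max(X, key=lambda x: f(x) + u * g(x))
    return f(x_best) + u * g(x_best), x_best


# Hypothetical toy instance (not from the paper): scalar u, finite X.
X = [-2, -1, 0, 1, 2]
f = lambda x: -x * x
g = lambda x: x

value_at_1, _ = dual_value(1.0, X, f, g)
value_at_2, _ = dual_value(2.0, X, f, g)
value_at_3, _ = dual_value(3.0, X, f, g)
```

Because the result is a pointwise maximum of functions that are affine in `u`, it is convex in `u`; the three sampled values are consistent with that.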

When $U$ is taken to be the (noncompact) set $\{u : u \ge 0 \text{ and } u \in R^m\}$, $D$ is simply the Lagrangian dual problem of the following nonlinear program:

$$P: \quad f(x^*) = \max f(x) \quad \text{s.t.} \quad g(x) \ge 0, \; x \in X.$$

Under an additional assumption that there exists an $x$ such that $g(x) > 0$, the solution to $P$ can be obtained by solving $D$ with $U = \{u : 0 \le u \le M \text{ and } u \in R^m\}$, where $M$ is sufficiently large.

On the other hand, when $f(x) = -F(x) \cdot x$ and $g(x) = F(x)$, where $F(x)$ is a continuous mapping from $R^m$ into itself and satisfies, for some $\alpha > 0$,

$$(F(u) - F(x)) \cdot (u - x) \ge \alpha \|u - x\|^2, \quad \forall u, x \in U,$$

then

$$\mathcal{L}(u) = \max_{x \in U} \{ -F(x) \cdot (x - u) \}$$

and $D$ becomes

$$\min_{u \in U} \max_{x \in U} \{ -F(x) \cdot (x - u) \} = -\max_{u \in U} \min_{x \in U} \{ F(x) \cdot (x - u) \}.$$

Hearn et al. (1984) referred to the problem on the right as the dual of the formulation based on the gap function for the following variational inequality:

Find $u^* \in U$ such that $F(u^*) \cdot (x - u^*) \ge 0$, $\forall x \in U$.

For the remainder, it is convenient to simply refer to $\mathcal{L}(u)$ as the dual function.

To solve $D$, the proximal minimization algorithm (see, e.g., Bertsekas and Tsitsiklis, 1989, Martinet, 1970, and Rockafellar, 1976) generates a sequence of points in $U$ by the iteration

$$u^{k+1} = \arg\min_{u \in U} \left\{ \mathcal{L}(u) + \frac{1}{2c_k} \|u - u^k\|^2 \right\} \qquad (2)$$

where $u^1$ is a starting point, $\|\cdot\|$ denotes the Euclidean norm and $c_k$ is a sequence of positive numbers with

$$\liminf_{k \to \infty} c_k > 0.$$

Although the above iterative process converges to an optimal solution of $D$, there is a concern regarding its practicality. Bertsekas and Tsitsiklis (1989) pointed out in their book that the proximal minimization algorithm requires solutions to a sequence of problems instead of just one problem. When $\mathcal{L}(u)$ is nondifferentiable, this concern is more acute. Adding the term $\frac{1}{2c_k}\|u - u^k\|^2$ only makes the objective function of the problem in (2) strictly convex. So, when $\mathcal{L}(u)$ is nondifferentiable, the objective function in (2) is still nondifferentiable, and solving a sequence of nondifferentiable, but strictly convex, problems does not appear as attractive as solving only one nondifferentiable problem that may not be strictly convex.
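To make the proximal iteration concrete, here is a minimal sketch on a hypothetical one-dimensional instance, $\mathcal{L}(u) = |u - 1|$ on $U = [-2, 4]$ with a constant $c_k = c$. For this particular function the proximal subproblem has a closed-form (soft-thresholding) solution, which is an assumption of the example and not something the general algorithm relies on.

```python
def prox_step(u_k, c, lo=-2.0, hi=4.0):
    """argmin over [lo, hi] of |u - 1| + (u - u_k)**2 / (2*c), in closed form."""
    d = u_k - 1.0
    # Soft-thresholding: shrink the distance to the kink at u = 1 by c.
    shrunk = max(abs(d) - c, 0.0) * (1.0 if d > 0 else -1.0)
    # In one dimension the constrained minimizer is the clipped unconstrained one.
    return min(max(1.0 + shrunk, lo), hi)


def proximal_minimization(u_start, c=0.5, iters=20):
    """Apply the proximal iteration a fixed number of times."""
    u = u_start
    for _ in range(iters):
        u = prox_step(u, c)
    return u


u_from_right = proximal_minimization(4.0)
u_from_left = proximal_minimization(-2.0)
```

Each pass solves one strictly convex subproblem, which is exactly the cost the paragraph above is concerned about when no closed form is available.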

To make proximal minimization more amenable to $D$, this paper approximates $\mathcal{L}(u)$ in (2) by the following function:

$$\mathcal{L}(u) \approx L^k(u) = \max_{i = 1, \dots, k} \{ f(x^i) + u \cdot g(x^i) \}$$

where $x^i \in X$. When $x^i$ is chosen appropriately, $L^k(u)$ is simply a maximum of a finite number of hyperplanes tangential to $\mathcal{L}(u)$. These hyperplanes are generally known as cuts or cutting planes.
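The cutting plane model can be sketched as follows, again assuming a scalar $u$ and a hypothetical finite $X$ with $f(x) = -x^2$ and $g(x) = x$. The helper `cut_model` is an illustrative name; each cut is stored as the pair $(f(x^i), g(x^i))$.

```python
def cut_model(cuts):
    """Build L^k(u) = max_i { f_i + u * g_i } from a list of (f_i, g_i) pairs."""
    return lambda u: max(f_i + u * g_i for f_i, g_i in cuts)


# Hypothetical toy instance: L(u) = max over x in X of -x^2 + u*x, scalar u.
X = [-2, -1, 0, 1, 2]
dual = lambda u: max(-x * x + u * x for x in X)

# Cuts generated by the maximizers x = 0 (at u = 1) and x = 2 (at u = 3).
model = cut_model([(0, 0), (-4, 2)])
grid = [i / 2 for i in range(-8, 9)]
gaps = [dual(u) - model(u) for u in grid]
```

The model never exceeds the dual function and touches it at the points where the cuts were generated, which is the sense in which the hyperplanes are tangential.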
To unify the above scheme with other algorithms that use cutting planes, this
paper describes in the next section a generic algorithm which combines cutting 
planes with proximal minimization. From this generic algorithm, three algorithms 
from the literature can be derived; they are the cutting plane algorithm, the 
cutting plane algorithm with line search and the family of bundle methods. Among 
these algorithms, the bundle methods can be viewed as a quadratic counterpart of 
the cutting plane algorithm with line search or vice versa, i.e., the latter is a linear 
counterpart for the former. This prompts the question of whether there exists a 
quadratic counterpart for the (plain) cutting plane algorithm. The results in this 
paper provide an affirmative response to the question. 

In the remainder of the paper, Section 2 formally states the generic algorithm and derives from it the three algorithms in the literature. Also derived is the new algorithm, which is a quadratic counterpart of the cutting plane algorithm. Section 3 provides convergence results for the new algorithm. To establish a closer relationship between proximal minimization and bundle methods, Section 4 provides a convergence proof for a simple version of the latter which is different from those in the literature and uses analysis common to proximal minimization. Finally, Section 5 concludes the paper.
2. Classification of Algorithms



To classify and establish relationships among algorithms, we first state a generic algorithm and then show how it can be specialized to four algorithms. Three of the four exist in the literature; the last is new and is shown to be a quadratic counterpart of the cutting plane algorithm.

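A GENERIC ALGORITHM

Step 1: Select $u^1 \in U$. Set $k = 1$, $v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{ f(x) + u^1 \cdot g(x) \}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \left\{ L^k(u) + \frac{1}{2c_k} \|u - v^k\|^2 \right\}.$$

If $v^k$ also solves the problem, stop and $v^k$ solves $D$.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{ f(x) + u^{k+1} \cdot g(x) \}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Derive $v^{k+1} \in U$ from $u^{k+1}$ and $v^k$ using some process and/or criteria (see discussion below). Set $k = k + 1$ and return to Step 2.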



Note that the (master) problem in Step 2 is slightly different from the one in equation (2) of the previous section. The ‘prox-center’ in the proximal term is $v^k$ for the master problem and it is $u^k$ for the problem in (2). In addition, the master problem in Step 2 can be stated as

$$\begin{aligned}
MP: \quad \min \;& w + \frac{1}{2c_k} \|u - v^k\|^2 \\
\text{s.t.} \;& w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \dots, k \\
& Au \le b
\end{aligned}$$
where the first $k$ constraints are generally referred to as cuts or cutting planes.

The dual of $MP$ can be written as

$$\begin{aligned}
MD: \quad \max \;& -\frac{c_k}{2} \|G\pi + A^T \lambda\|^2 + (G^T v^k + f) \cdot \pi + (A v^k - b) \cdot \lambda \\
\text{s.t.} \;& \sum_{i=1}^{k} \pi_i = 1 \\
& \pi_i \ge 0, \quad i = 1, \dots, k \\
& \lambda_j \ge 0, \quad j = 1, \dots, p
\end{aligned}$$

where $f$ denotes a vector in $R^k$ with $f(x^i)$ as its components, $G$ denotes an $m \times k$ matrix with $g(x^i)$ as its columns, $\pi_i$ are the dual variables corresponding to the cutting plane constraints and $\lambda_j$ are dual variables corresponding to the constraints defined by the matrix $A$. In any case, both the master problem and its dual can be solved in a finite number of iterations. Pang (1983) and Lin and Pang (1987) reviewed a large number of algorithms applicable to both $MP$ and $MD$. More specifically, Kiwiel (1991) designed a dual algorithm to solve $MP$ and Bertsekas (1982) proposed an efficient algorithm designed especially for convex programming problems with simple constraints such as those in $MD$.
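For readers who want to check the form of $MD$, the following is a sketch of the standard Lagrangian derivation from $MP$. The symbol $\Lambda$ for the Lagrangian is introduced here only for illustration and does not appear in the paper.

```latex
% Lagrangian of MP with multipliers \pi \ge 0 (cuts) and \lambda \ge 0 (Au \le b)
\begin{aligned}
\Lambda(u, w, \pi, \lambda)
  &= w + \tfrac{1}{2c_k}\|u - v^k\|^2
     + \sum_{i=1}^{k} \pi_i \bigl( f(x^i) + u \cdot g(x^i) - w \bigr)
     + \lambda \cdot (Au - b) \\
\partial \Lambda / \partial w = 0
  &\;\Rightarrow\; \textstyle\sum_{i=1}^{k} \pi_i = 1 \\
\nabla_u \Lambda = 0
  &\;\Rightarrow\; u = v^k - c_k \,(G\pi + A^{T}\lambda) \\
\min_{u, w} \Lambda
  &= -\tfrac{c_k}{2}\,\|G\pi + A^{T}\lambda\|^2
     + (G^{T} v^k + f) \cdot \pi + (A v^k - b) \cdot \lambda
\end{aligned}
```

The third line also indicates how a primal solution of the master problem could be recovered from optimal multipliers, namely $u^{k+1} = v^k - c_k (G\pi + A^T \lambda)$.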

Below, we describe four specializations of the generic algorithm. They are 
the cutting plane algorithm, the cutting plane algorithm with line searches, the 
bundle methods and a new algorithm called the proximal minimization algorithm 
with cutting planes. 

The cutting plane (CP) algorithm: The generic algorithm reduces to the CP algorithm when $c_k = \infty$ and $v^{k+1} = u^{k+1}$ $\forall k$. First, setting $c_k = \infty$ makes the proximal term vanish from the objective function of the master problem in Step 2, thereby reducing it to the following linear program:

$$\begin{aligned}
ML: \quad \min \;& w \\
\text{s.t.} \;& w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \dots, k \\
& Au \le b
\end{aligned}$$

Without the proximal term and always setting $v^{k+1} = u^{k+1}$, the variable $v$ becomes superfluous and can be eliminated from the algorithm entirely. This reduces Step 4 to simply incrementing $k$ by one. It can be shown that the stopping rule in
Step 2 of the generic algorithm is equivalent to the one typical for the CP algorithm, which is to stop when $L^k(u^{k+1}) = \mathcal{L}(u^{k+1})$ in Step 3.
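A minimal sketch of the CP algorithm on a hypothetical one-dimensional instance may help fix ideas. It assumes a scalar $u$ on an interval, a finite $X$, and solves the linear master problem by enumerating endpoints and kinks instead of calling an LP solver; the stopping test used is the gap between the dual function value and the master value. All names and data below are illustrative assumptions.

```python
from itertools import combinations


def solve_lp_master(cuts, lo, hi):
    """Minimize max_i (f_i + g_i*u) over [lo, hi] by checking endpoints and kinks."""
    model = lambda u: max(fi + gi * u for fi, gi in cuts)
    candidates = [lo, hi]
    for (f1, g1), (f2, g2) in combinations(cuts, 2):
        if g1 != g2:
            u = (f2 - f1) / (g1 - g2)      # intersection of the two cuts
            if lo <= u <= hi:
                candidates.append(u)
    u_best = min(candidates, key=model)
    return u_best, model(u_best)


def cutting_plane(X, f, g, lo, hi, u_start, tol=1e-9, max_iter=50):
    oracle = lambda u: max(X, key=lambda x: f(x) + u * g(x))
    x = oracle(u_start)                     # Step 1
    cuts = [(f(x), g(x))]
    u, dual = u_start, f(x) + u_start * g(x)
    for _ in range(max_iter):
        u, w = solve_lp_master(cuts, lo, hi)   # Step 2 with c_k = infinity
        x = oracle(u)                          # Step 3
        dual = f(x) + u * g(x)
        if dual - w <= tol:                    # model value equals dual value
            break
        cuts.append((f(x), g(x)))
    return u, dual


# Hypothetical toy data giving L(u) = |u - 1| on U = [-2, 4].
X = [-1, 0, 1]
f = lambda x: -x
g = lambda x: x
u_cp, value_cp = cutting_plane(X, f, g, -2.0, 4.0, 4.0)
```

On this instance the first master solution jumps to the far endpoint of the interval before the second cut pulls it back, which illustrates that the dual function values need not decrease from one iteration to the next.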

The CP algorithm was first introduced by Cheney and Goldstein (1959) and Kelly (1960). Dantzig and Wolfe (1960) developed a related algorithm called the column generation technique in the context of decomposing large scale linear programs. Column generation was later generalized to solve Lagrangian dual problems for mathematical programs (see Dantzig, 1963 and Magnanti et al., 1976) and was given the name generalized linear programming technique. Regardless of the terminology, it is well known (see, e.g., Dantzig, 1963, Magnanti et al., 1976 and Zangwill, 1969) that the convergence of the CP algorithm follows from the monotonicity of the sequence $\{w^k\}$ or $\{L^{k-1}(u^k)\}$. However, the corresponding sequence of dual function values, $\{\mathcal{L}(u^k)\}$, is not necessarily monotonic. Therefore, the CP algorithm is a variant of the generic algorithm which has a linear master problem and does not attempt to descend the dual function.

The cutting plane algorithm with line search (CPLS): In an effort to force the CP algorithm to descend the dual function, Hearn and Lawphongpanich (1989b & 1990) added a line search step. CPLS can be obtained from the generic algorithm by setting $c_k = \infty$ for all $k$ and, in Step 4, letting

$$v^{k+1} = \arg\min_{0 \le \lambda \le \lambda_{\max}} \{ \mathcal{L}(v^k + \lambda (u^{k+1} - v^k)) \}$$

where $\lambda_{\max} = \max\{\lambda : v^k + \lambda (u^{k+1} - v^k) \in U\}$. Thus, $v^{k+1}$ minimizes $\mathcal{L}(u)$ along the direction $d^k = u^{k+1} - v^k$. Hearn and Lawphongpanich (1989a) showed that, if $\mathcal{L}(u)$ is differentiable at $v^k$, then $d^k$ is a descent direction and $\mathcal{L}(v^{k+1}) < \mathcal{L}(v^k)$. Therefore, CPLS is a variant of the generic algorithm which has a linear master problem and attempts to descend the dual function, i.e., a descent is guaranteed whenever the dual function is differentiable at the current iterate, $v^k$.

The bundle methods: As in CPLS, the main thrust of the bundle methods, first introduced by Lemarechal (1974, 1975) and Wolfe (1975), is to generate a monotonic sequence of dual function values. From the generic algorithm, one can obtain a version of the bundle methods by setting $c_k < \infty$ for all $k$ and, in Step 4, letting

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m \left( L^k(v^k) - L^k(u^{k+1}) \right) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Other methods for determining $v^{k+1}$ exist and they can be found in, e.g., Auslender (1987), Fukushima (1984), Gaudioso and Monaco (1982), Kiwiel (1985 & 1989), Lemarechal (1989) and Mifflin (1977). Also, note that updating $v^{k+1}$ is in essence choosing the prox-center for the next iteration.
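The prox-center update can be sketched as a small function. The test implemented here is the descent test $\mathcal{L}(u^{k+1}) + m (L^k(v^k) - L^k(u^{k+1})) \le \mathcal{L}(v^k)$ for a scalar $u$; the function name `bundle_update` and the toy data are hypothetical.

```python
def bundle_update(v_k, u_next, dual, model_k, m=0.5):
    """Return (v_next, is_serious) for the bundle-type prox-center update."""
    predicted = model_k(v_k) - model_k(u_next)   # L^k(v^k) - L^k(u^{k+1})
    if dual(u_next) + m * predicted <= dual(v_k):
        return u_next, True                      # accept the trial point
    return v_k, False                            # keep the old prox-center


# Hypothetical toy data: L(u) = |u - 1| and a model made of the single cut u - 1.
dual = lambda u: abs(u - 1.0)
model = lambda u: u - 1.0

accepted = bundle_update(4.0, 3.0, dual, model)
rejected = bundle_update(4.0, -2.0, dual, model)
```

In the second call the model predicts a large decrease that the dual function does not deliver, so the prox-center stays where it is.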

Several authors (e.g., Fukushima, 1984, Kiwiel, 1989 and Lemarechal, 1991) have observed the similarity between proximal minimization and bundle methods. However, it is interesting that the developments of the two types of algorithms appear different. In an effort to unify the development of bundle methods and proximal minimization, Section 4 provides a convergence proof for a simple method for updating $v^{k+1}$ which is different from, but related to, the one shown above.

When $v^{k+1} = u^{k+1}$, the iteration is called a ‘serious’ step. Otherwise (i.e., $v^{k+1} = v^k$), it is called a ‘null’ step. So, after every serious step, the dual function decreases and bundle methods change the prox-center. Since $c_k < \infty$, the master problem for the bundle methods is quadratic (see problem $MP$ or $MD$). Therefore, any bundle method can be considered as a quadratic counterpart of CPLS since it has a quadratic master problem and attempts to descend the dual function, in that it decreases the dual function at every serious step. To emphasize the fact that bundle methods are variants of the generic algorithm, we also refer to them as proximal minimization algorithms with subgradient bundles (PMSB).

Proximal minimization with cutting planes (PMCP): Setting $c_k < \infty$ and always letting $v^{k+1} = u^{k+1}$ in Step 4 produces a variant of the generic algorithm which has a quadratic master problem and does not attempt to descend the dual function. Note that PMCP is similar to the bundle methods because both have a quadratic master problem; however, it is different because it changes the prox-center after every iteration instead of only after a serious iteration. In the framework of the generic algorithm, PMCP is a quadratic counterpart of the cutting plane algorithm, for neither attempts to descend the dual function, and one has a linear master problem and the other a quadratic one.

As mentioned earlier, the convergence of the CP algorithm does not require any monotonicity of the dual function values. On one hand, it is curious that an algorithm can converge without any attempt to decrease the dual function directly. On the other hand, the convergence of the CP algorithm confirms that decreases in the cutting plane approximating function are sufficient to ensure that the dual function eventually converges (not necessarily in a monotonic manner) to the optimal value. The convergence proof for PMCP in the next section further corroborates this hypothesis.

Table 1 below summarizes the relationships among the four algorithms which use cutting planes to approximate the objective function. Recall that the phrase ‘attempt to descend the dual function’ indicates that, although none of the four algorithms guarantees a decrease in the dual function at every iteration, some make an attempt to decrease the function in each one. In particular, the bundle methods only yield a decrease at every serious step and CPLS yields one whenever the dual function is differentiable. Nevertheless, all are proven to converge to a solution of $D$.



                              Attempt to Descend the Dual Function
Master Problem                Yes                         No

Linear ($c_k = \infty$)       CPLS                        CP
Quadratic ($c_k < \infty$)    Bundle methods or PMSB      PMCP

Table 1: Classes of algorithms which use cutting planes
3. Convergence of PMCP



Below, we restate more concisely the generic algorithm as specialized to the proximal minimization algorithm with cutting planes.

A PROXIMAL MINIMIZATION ALGORITHM
WITH CUTTING PLANES (PMCP)

Step 1: Select $u^1 \in U$. Set $k = 1$ and

$$x^1 = \arg\max_{x \in X} \{ f(x) + u^1 \cdot g(x) \}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \left\{ L^k(u) + \frac{1}{2c_k} \|u - u^k\|^2 \right\}.$$

If $u^{k+1} = u^k$, stop and $u^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{ f(x) + u^{k+1} \cdot g(x) \}.$$

Increment $k$ by 1 and go to Step 2.
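The steps above can be sketched in a few lines for a hypothetical one-dimensional instance. The sketch assumes a scalar $u$ on an interval, a finite $X$ and a constant $c_k = c$, and it solves the quadratic master problem by enumerating candidate points (stationary points of each piece, kinks and endpoints) instead of calling a QP solver. The names `solve_prox_master` and `pmcp` are illustrative.

```python
from itertools import combinations


def solve_prox_master(cuts, center, c, lo, hi):
    """Minimize max_i(f_i + g_i*u) + (u - center)**2 / (2*c) over [lo, hi]."""
    obj = lambda u: max(fi + gi * u for fi, gi in cuts) + (u - center) ** 2 / (2 * c)
    candidates = [lo, hi]
    candidates += [center - c * gi for _, gi in cuts]       # stationary point of each piece
    candidates += [(f2 - f1) / (g1 - g2)                    # kinks between two cuts
                   for (f1, g1), (f2, g2) in combinations(cuts, 2) if g1 != g2]
    feasible = [min(max(u, lo), hi) for u in candidates]    # clip to the interval
    return min(feasible, key=obj)


def pmcp(X, f, g, lo, hi, u_start, c=1.0, tol=1e-9, max_iter=200):
    oracle = lambda u: max(X, key=lambda x: f(x) + u * g(x))
    u = u_start
    x = oracle(u)                                           # Step 1
    cuts = [(f(x), g(x))]
    for _ in range(max_iter):
        u_next = solve_prox_master(cuts, u, c, lo, hi)      # Step 2, prox-center u^k
        if abs(u_next - u) <= tol:                          # stop when u^{k+1} = u^k
            return u
        u = u_next
        x = oracle(u)                                       # Step 3
        cuts.append((f(x), g(x)))
    return u


# Hypothetical toy data giving L(u) = |u - 1| on U = [-2, 4].
X = [-1, 0, 1]
f = lambda x: -x
g = lambda x: x
u_a = pmcp(X, f, g, -2.0, 4.0, 4.0)
u_b = pmcp(X, f, g, -2.0, 4.0, -2.0)
```

Starting from $u^1 = -2$ on this instance, the iterates overshoot the minimizer once before returning to it, so the dual function values are not monotone even though the method converges.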



First, note that since $v^k$ always equals $u^k$, the variable $v$ is not needed and has been eliminated from the above algorithm. Then, recall that in Step 2 $L^k(u)$ is convex and defined previously as

$$L^k(u) = \max_{i = 1, \dots, k} \{ f(x^i) + u \cdot g(x^i) \}.$$

In Step 3, $x^{k+1}$ satisfies

$$\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1}) = L^{k+1}(u^{k+1}). \qquad (3)$$

The theorem below validates the stopping rule in Step 2.
Theorem 1. If $u^{k+1} = u^k$, then $u^k$ is an optimal solution to problem $D$.

Proof: Consider the cutting plane representation of the master problem at the $k$-th iteration:

$$\begin{aligned}
\min \;& w + \frac{1}{2c_k} \|u - u^k\|^2 \\
\text{s.t.} \;& w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \dots, k \\
& Au \le b
\end{aligned}$$

Then, $w^{k+1} = L^k(u^{k+1})$, where $(u^{k+1}, w^{k+1})$ is an optimal solution. Since $u^{k+1} = u^k$, it follows from (3) that

$$w^{k+1} = L^k(u^k) = \mathcal{L}(u^k).$$

In addition, the KKT conditions are necessary at $(u^{k+1}, w^{k+1})$ and there must exist vectors $\pi$ and $\lambda$ satisfying the following equations:

$$\begin{aligned}
\sum_{i \in I'} g(x^i) \pi_i + \sum_{j \in J'} a^j \lambda_j &= 0 \\
\sum_{i \in I'} \pi_i &= 1 \\
\pi_i, \lambda_j &\ge 0 \quad \forall i \in I' \text{ and } j \in J'
\end{aligned}$$

where

$a^j$ = the $j$-th row of matrix $A$,
$I' = \{ i : w^{k+1} = f(x^i) + u^{k+1} \cdot g(x^i) \text{ for } i = 1, \dots, k \}$ and
$J' = \{ j : a^j \cdot u^{k+1} = b_j \text{ for } j = 1, \dots, p \}$.

Since $w^{k+1} = \mathcal{L}(u^k)$, the vectors $g(x^i)$, $\forall i \in I'$, are subgradients of $\mathcal{L}$ at $u^k$ and

$$H(g(x^i) : i \in I') \subset \partial \mathcal{L}(u^k)$$

where $H(\cdot)$ denotes a convex hull. Thus, the KKT conditions can be written more compactly as

$$0 \in \partial \mathcal{L}(u^k) + \sum_{j \in J'} a^j \lambda_j.$$

However, this is the KKT condition for problem $D$. Since $\mathcal{L}(u)$ is convex and $U$ is a polyhedron, the condition is sufficient and the proof is complete. □

By the above theorem, if PMCP stops after a finite number of iterations, it must stop at an optimal solution. When PMCP generates an infinite sequence, it is sufficient to show that PMCP converges to an optimal solution for the case $c_k = c > 0$ $\forall k$. (This is true because of the assumption that $\liminf_{k \to \infty} c_k > 0$.) To do so, define the following:

$X^\infty = \{x^1, x^2, \dots\}$, i.e., the set of $x^i$ generated by Steps 1 and 3 of PMCP.

$[X^\infty]$ = the closure of $X^\infty$. Note that $[X^\infty] \subset X$.

$L^\infty(u) = \max_{x \in [X^\infty]} \{ f(x) + u \cdot g(x) \}$.

From the above description, it is clear that

$$L^j(u) \le L^\infty(u) \le \mathcal{L}(u) \quad \text{for } j = 1, 2, \dots$$

where the first inequality follows from the fact that $\{x^i : i = 1, \dots, j\} \subset [X^\infty]$ and the second inequality from the fact that $[X^\infty] \subset X$. Observe also that for any $k$

$$L^{k+j}(u^k) = \mathcal{L}(u^k) = f(x^k) + u^k \cdot g(x^k) \quad \forall j = 0, 1, 2, \dots \qquad (4)$$

Similarly, since $x^k \in [X^\infty]$, the following must hold

$$L^\infty(u^k) = \mathcal{L}(u^k) \quad \forall k < \infty. \qquad (5)$$

Moreover, $\{L^k(u)\}_k$ is a sequence of continuous convex functions which converges pointwise to $L^\infty(u)$. However, since $\{L^k(u)\}_k$ is also monotonic, it must also converge uniformly to $L^\infty(u)$ on the compact set $U$ (see Theorem 7.13 in Rudin, 1976).

To prove convergence and obtain a solution to $D$, define a sequence $\{z^k\}$ as follows: let $z^1 = \mathcal{L}(u^1)$ and for $k = 1, 2, 3, \dots$ let

$$z^{k+1} = \begin{cases} \mathcal{L}(u^{k+1}) & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c} \|u^{k+1} - u^k\|^2 \le z^k \\ z^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Also, we have from (4) that $\mathcal{L}(u^{k+1}) = L^{k+1}(u^{k+1})$. So, computing $z^k$ requires no extra effort. Next, construct an index set $\mathcal{K}$ as follows:

$$\mathcal{K} = \{ k : z^{k+1} = \mathcal{L}(u^{k+1}) \}.$$

In words, $\mathcal{K}$ is the index set of iterations in which there is a sufficient decrease in the dual function, i.e., by an amount $\frac{m}{2c} \|u^{k+1} - u^k\|^2$. The next two results address the convergence of PMCP when $\mathcal{K}$ is an infinite set.
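As a small illustration of this bookkeeping, the sketch below computes the sequence $z^k$ and collects the iterations that pass the sufficient-decrease test, for a scalar $u$. The function name and the sample iterates are hypothetical; list position `j` holds the quantities for the paper's index `j + 1`.

```python
def sufficient_decrease_indices(dual_values, iterates, m=0.5, c=1.0):
    """Return (z, K): z[j] is z^{j+1}; K lists the paper's k that pass the test."""
    z = [dual_values[0]]                       # z^1 = L(u^1)
    K = []
    for j in range(len(iterates) - 1):
        step = (iterates[j + 1] - iterates[j]) ** 2
        if dual_values[j + 1] + m / (2 * c) * step <= z[-1]:
            z.append(dual_values[j + 1])       # sufficient decrease: z^{k+1} = L(u^{k+1})
            K.append(j + 1)                    # paper's 1-based iteration index k
        else:
            z.append(z[-1])                    # otherwise z^{k+1} = z^k
    return z, K


# Hypothetical iterates and dual values for L(u) = |u - 1|.
iterates = [-2, -1, 0, 1, 2, 1]
dual_values = [abs(u - 1) for u in iterates]
z, K = sufficient_decrease_indices(dual_values, iterates)
```

The sequence `z` is nonincreasing by construction even though the dual values themselves go up once in this sample.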

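Lemma 2. Let $\mathcal{K}$ be an infinite set. If a subsequence $\{u^k\}_{k \in K}$ converges to $u^\infty$ for some $K \subset \mathcal{K}$, then $\{u^{k+1}\}_{k \in K}$ converges to $u^\infty$.

Proof: Consider the sequence $\{z^k\}_k$. By definition, it is a nonincreasing sequence which is bounded below by $\mathcal{L}(u^*)$. Thus, $\{z^k\}_k$ must converge. Since $K \subset \mathcal{K}$, the following must hold for all $k \in K$

$$z^{k+1} + \frac{m}{2c} \|u^{k+1} - u^k\|^2 \le z^k.$$

Taking the limit as $k \to \infty$ and $k \in K$ yields that

$$\lim_{k \in K} \frac{m}{2c} \|u^{k+1} - u^k\|^2 = 0.$$

Since both $m$ and $c$ are positive, $\{u^k\}_{k \in K}$ and $\{u^{k+1}\}_{k \in K}$ must have a common limit point, $u^\infty$. □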

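Theorem 3. If $\mathcal{K}$ is an infinite set, then every limit point of the sequence $\{u^k\}_{k \in \mathcal{K}}$ is a solution to $D$.

Proof: Let $u^*$ be a solution to $D$ and $\lim_{k \in K} u^k = u^\infty$ for some $K \subset \mathcal{K}$. Since $u^{k+1}$ solves the master problem in Step 2, the following must hold

$$L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 \le L^k(u) + \frac{1}{2c} \|u - u^k\|^2 \quad \forall u \in U \text{ and } \forall k. \qquad (6)$$

For any $\alpha \in (0, 1)$, setting $u = \alpha u^* + (1 - \alpha) u^{k+1}$ in (6) gives

$$\begin{aligned}
L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 &\le L^k(\alpha u^* + (1 - \alpha) u^{k+1}) + \frac{1}{2c} \|\alpha u^* + (1 - \alpha) u^{k+1} - u^k\|^2 \\
L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 &\le \alpha L^k(u^*) + (1 - \alpha) L^k(u^{k+1}) + \frac{1}{2c} \|\alpha (u^* - u^k) + (1 - \alpha)(u^{k+1} - u^k)\|^2 \\
L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 &\le \alpha L^k(u^*) + (1 - \alpha) L^k(u^{k+1}) + \frac{1}{2c} \left( \|\alpha (u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\| \right)^2 \\
\alpha L^k(u^{k+1}) &\le \alpha L^k(u^*) - \frac{1}{2c} \|u^{k+1} - u^k\|^2 + \frac{1}{2c} \left( \|\alpha (u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\| \right)^2 \\
\alpha L^k(u^{k+1}) &\le \alpha \mathcal{L}(u^*) - \frac{1}{2c} \|u^{k+1} - u^k\|^2 + \frac{1}{2c} \left( \|\alpha (u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\| \right)^2 \qquad (7)
\end{aligned}$$

where the second inequality follows from convexity of $L^k(u)$, the third from the triangle inequality and the last from the fact that $L^k(u^*) \le \mathcal{L}(u^*)$. Since $L^j(u)$ is continuous for all $j = 1, 2, \dots$ and, from Lemma 2, $\|u^{k+1} - u^k\| \to 0$ for $k \in K$, there must exist, for any $\varepsilon > 0$, a sufficiently large $k_1$ such that for any $j$

$$|L^j(u^{k+1}) - L^j(u^k)| < \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1, \text{ or,}$$

$$L^j(u^k) - \varepsilon < L^j(u^{k+1}) < L^j(u^k) + \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1.$$

Setting $j = k$ and using (3), i.e., $L^k(u^k) = \mathcal{L}(u^k)$, yield the following

$$\mathcal{L}(u^k) - \varepsilon < L^k(u^{k+1}) < \mathcal{L}(u^k) + \varepsilon, \quad \forall k \in K \text{ and } k \ge k_1.$$

Combine the left inequality with (7) to obtain that

$$\alpha (\mathcal{L}(u^k) - \varepsilon) < \alpha \mathcal{L}(u^*) - \frac{1}{2c} \|u^{k+1} - u^k\|^2 + \frac{1}{2c} \left( \|\alpha (u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\| \right)^2 \quad \forall k \in K \text{ and } k \ge k_1.$$

Take the limit as $k \to \infty$ and $k \in K$ and obtain

$$\alpha (\mathcal{L}(u^\infty) - \varepsilon) \le \alpha \mathcal{L}(u^*) + \frac{1}{2c} \|\alpha (u^* - u^\infty)\|^2$$

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) \le \frac{\alpha}{2c} \|u^* - u^\infty\|^2 + \varepsilon. \qquad (8)$$

Since (8) holds for any $\alpha \in (0, 1)$ and $\varepsilon$ can be chosen arbitrarily small, it must be true that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) = 0.$$

Thus, $u^\infty$ is a solution to $D$. □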
An immediate consequence of Theorem 3 is that the entire sequence $\{u^k\}_{k \in \mathcal{K}}$ converges to the optimal solution when $D$ has a unique solution (see, e.g., Bazaraa and Shetty, 1979).

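Consider now the case when $\mathcal{K}$ is finite. Define $\ell = \max\{k : k \in \mathcal{K}\} + 1$. Then, $z^{k+1} = z^\ell$ $\forall k \ge \ell$, and

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c} \|u^{k+1} - u^k\|^2 > z^\ell \quad \forall k \ge \ell. \qquad (9)$$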

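Lemma 4. Let $\mathcal{K}$ be a finite set and $\ell$ be as defined above. Then,

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 = 0.$$

Proof: Assume otherwise, i.e., there exists a $\delta > 0$ such that

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 \ge \delta. \qquad (10)$$

In other words, for a sufficiently large $k_1 \ge \ell$,

$$\|u^{k+1} - u^k\|^2 \ge \frac{3}{4} \delta \quad \forall k \ge k_1. \qquad (11)$$

From Theorem 3, setting $u = u^k$ in (6) produces the following

$$L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 \le L^k(u^k) \quad \forall k. \qquad (12)$$

Since $\{L^k(u)\}_k$ converges uniformly to $L^\infty(u)$, there must exist for every $\varepsilon \in (0, \frac{\delta}{2})$ a sufficiently large $k_2$ such that

$$|L^k(u) - L^\infty(u)| < \frac{\varepsilon}{4c} \quad \forall k \ge k_2 \text{ and } u \in U, \text{ or}$$

$$L^\infty(u) - \frac{\varepsilon}{4c} < L^k(u) < L^\infty(u) + \frac{\varepsilon}{4c} \quad \forall k \ge k_2 \text{ and } u \in U. \qquad (13)$$

Combining (12) and (13) yields

$$L^\infty(u^{k+1}) - \frac{\varepsilon}{4c} + \frac{1}{2c} \|u^{k+1} - u^k\|^2 \le L^\infty(u^k) + \frac{\varepsilon}{4c} \quad \forall k \ge k_2$$

$$L^\infty(u^{k+1}) + \frac{1}{2c} \left\{ \|u^{k+1} - u^k\|^2 - \varepsilon \right\} \le L^\infty(u^k) \quad \forall k \ge k_2. \qquad (14)$$

Using (5), we must have that

$$\mathcal{L}(u^{k+1}) + \frac{1}{2c} \left\{ \|u^{k+1} - u^k\|^2 - \varepsilon \right\} \le \mathcal{L}(u^k) \quad \forall k \ge k_2.$$

However, (11) and (14) imply that the subsequence $\{\mathcal{L}(u^k)\}_{k \ge \bar{k}}$, where $\bar{k} = \max(k_1, k_2)$, is a monotonically decreasing sequence and bounded below by $\mathcal{L}(u^*)$. Therefore, $\{\mathcal{L}(u^k)\}_{k \ge \bar{k}}$ must converge and

$$\lim_{k \ge \bar{k}} \frac{1}{2c} \left\{ \|u^{k+1} - u^k\|^2 - \varepsilon \right\} = \lim_{k \ge \bar{k}} \left( \mathcal{L}(u^{k+1}) - \mathcal{L}(u^k) \right) = 0$$

$$\lim_{k \ge \bar{k}} \|u^{k+1} - u^k\|^2 = \varepsilon.$$

Since $\varepsilon$ can be chosen arbitrarily small, this contradicts (10). □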

The above lemma implies that there exists a $K \subset \{k : k \ge \ell\}$ such that

$$\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0.$$

Since $U$ is compact and $u^k \in U$ for all $k$, there must also exist a $K' \subset K$ such that $\{u^k\}_{k \in K'}$ converges to, say, $u^\infty$. As a consequence, $\{u^{k+1}\}_{k \in K'}$ must converge to $u^\infty$ as well.



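Theorem 5. If $\mathcal{K}$ is finite, then $u^\ell$ is a solution to $D$.

Proof: Based on the preceding discussion, there must exist a $K \subset \{k : k \ge \ell\}$ such that the following conditions hold

1. $\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0$.

2. $\{u^k\}_{k \in K} \to u^\infty$.

3. $\{u^{k+1}\}_{k \in K} \to u^\infty$.

From Theorem 3, setting $u = \alpha u^* + (1 - \alpha) u^{k+1}$ in (6) for any $\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 \le L^k(\alpha u^* + (1 - \alpha) u^{k+1}) + \frac{1}{2c} \|\alpha u^* + (1 - \alpha) u^{k+1} - u^k\|^2.$$

Using the same argument as in Theorem 3 with the index set $K$, it can be shown that

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*). \qquad (15)$$

Similarly, setting $u = \alpha u^\ell + (1 - \alpha) u^{k+1}$ in (6) for any $\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - u^k\|^2 \le L^k(\alpha u^\ell + (1 - \alpha) u^{k+1}) + \frac{1}{2c} \|\alpha u^\ell + (1 - \alpha) u^{k+1} - u^k\|^2,$$

and by the same reasoning it must follow that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^\ell) \le 0 \quad \text{or} \quad \mathcal{L}(u^\infty) \le \mathcal{L}(u^\ell). \qquad (16)$$

However, from (9) it is true that

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c} \|u^{k+1} - u^k\|^2 > z^\ell \quad \forall k \ge \ell.$$

Take the limit as $k \to \infty$ and $k \in K$ and invoke the continuity of $\mathcal{L}(u)$ to obtain that

$$\mathcal{L}(u^\infty) \ge z^\ell = \mathcal{L}(u^\ell). \qquad (17)$$

Combining (15), (16) and (17) yields

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*) = \mathcal{L}(u^\ell).$$

So, $u^\ell$ must be a solution to $D$. □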

In addition to the above convergence results, if $f(x)$ and $g(x)$ are linear functions and $X$ is a bounded polyhedron, then $x^{k+1}$ in Step 3 can be restricted to extreme points of $X$, of which there are finitely many. In that case, there must exist a sufficiently large $\ell$ such that $L^k(u) = L^\ell(u)$ $\forall k \ge \ell$ and $u \in U$. However, this implies that after $\ell$ iterations PMCP reduces to the application of the proximal minimization algorithm to the following linear program:

$$\begin{aligned}
\min \;& w \\
\text{s.t.} \;& w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \dots, \ell \\
& Au \le b
\end{aligned}$$

Then, it follows from Exercise 4.3 in Bertsekas and Tsitsiklis (1989) that PMCP terminates finitely when $D$ is the dual of a linear program, or equivalently, $\mathcal{L}(u)$ is piecewise linear.






4. A Bundle Method 



Below, we describe a particular variant of the bundle methods which uses a different scheme for updating $v^{k+1}$ in Step 4 of the generic algorithm. For later reference, we call this variant a proximal minimization algorithm with subgradient bundles (PMSB). One intention of this section is to present a convergence proof for PMSB which uses analysis similar to that of PMCP, thereby making the relationship between bundle methods and proximal minimization more concrete. Also, it should be noted that some variants of the bundle methods require a line search step (see, e.g., Fukushima, 1984, Gaudioso and Monaco, 1982 and Kiwiel, 1985). However, PMSB as stated below does not require any line search.

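A PROXIMAL MINIMIZATION ALGORITHM
WITH SUBGRADIENT BUNDLES (PMSB)

Step 1: Select $u^1 \in U$ and $m$ such that $0 < m < 1$. Set $k = 1$, $v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{ f(x) + u^1 \cdot g(x) \}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \left\{ L^k(u) + \frac{1}{2c_k} \|u - v^k\|^2 \right\}.$$

If $u^{k+1} = v^k$, stop and $v^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{ f(x) + u^{k+1} \cdot g(x) \}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Set

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c_k} \|u^{k+1} - v^k\|^2 \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (18)$$

Increment $k$ by 1 and go to Step 2.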
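A minimal sketch of PMSB on a hypothetical one-dimensional instance follows. It assumes a scalar $u$ on an interval, a finite $X$ and a constant $c_k = c$; the master problem is solved by enumerating candidate points instead of calling a QP solver, and the prox-center is moved only when the test in Step 4 holds. The names `prox_master` and `pmsb` are illustrative.

```python
from itertools import combinations


def prox_master(cuts, center, c, lo, hi):
    """Minimize max_i(f_i + g_i*u) + (u - center)**2 / (2*c) over [lo, hi]."""
    obj = lambda u: max(fi + gi * u for fi, gi in cuts) + (u - center) ** 2 / (2 * c)
    candidates = [lo, hi] + [center - c * gi for _, gi in cuts]
    candidates += [(f2 - f1) / (g1 - g2)
                   for (f1, g1), (f2, g2) in combinations(cuts, 2) if g1 != g2]
    return min((min(max(u, lo), hi) for u in candidates), key=obj)


def pmsb(X, f, g, lo, hi, u_start, m=0.5, c=1.0, tol=1e-9, max_iter=200):
    dual = lambda u: max(f(x) + u * g(x) for x in X)
    oracle = lambda u: max(X, key=lambda x: f(x) + u * g(x))
    v = u_start                                   # Step 1: v^1 = u^1
    x = oracle(u_start)
    cuts = [(f(x), g(x))]
    serious = 0
    for _ in range(max_iter):
        u = prox_master(cuts, v, c, lo, hi)       # Step 2, prox-center v^k
        if abs(u - v) <= tol:                     # stop when u^{k+1} = v^k
            return v, serious
        x = oracle(u)                             # Step 3
        cuts.append((f(x), g(x)))
        if dual(u) + m / (2 * c) * (u - v) ** 2 <= dual(v):
            v = u                                 # Step 4: serious step
            serious += 1
        # otherwise a null step: the prox-center v is left unchanged
    return v, serious


# Hypothetical toy data giving L(u) = |u - 1| on U = [-2, 4].
X = [-1, 0, 1]
f = lambda x: -x
g = lambda x: x
v_final, n_serious = pmsb(X, f, g, -2.0, 4.0, -2.0)
```

On this instance the trial point overshoots the minimizer once; that iteration is a null step, so the prox-center stays at the minimizer and the next master problem confirms it.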
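Recall that, when $v^{k+1} = u^{k+1}$, iteration $k$ is called a ‘serious’ step or iteration. Otherwise (i.e., $v^{k+1} = v^k$), it is called a ‘null’ step. In addition, the updating formula for $v^{k+1}$ in Step 4 is also related to the one in Section 2 (see also Lemarechal, 1991), which is

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m \left( L^k(v^k) - L^k(u^{k+1}) \right) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (19)$$

To obtain the relationship, observe that since $u^{k+1}$ is a solution to the master problem

$$\frac{m}{2c_k} \|u^{k+1} - v^k\|^2 \le m \left( L^k(v^k) - L^k(u^{k+1}) \right)$$

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c_k} \|u^{k+1} - v^k\|^2 \le \mathcal{L}(u^{k+1}) + m \left( L^k(v^k) - L^k(u^{k+1}) \right).$$

So, the updating formula (19) implies (18).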



When PMSB terminates finitely, Theorem 1 in the previous section still guarantees that $v^k$ is an optimal solution to $D$. Below are convergence results for the case when the algorithm generates an infinite sequence. As in Section 3, it is assumed without loss of generality that $c_k = c > 0$, $\forall k$, and let

$$\mathcal{K} = \{ k : v^{k+1} = u^{k+1} \}.$$

So, $\mathcal{K}$ is the index set for the serious steps (iterations).



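Lemma 6. Let $\mathcal{K}$ be an infinite set. If a subsequence $\{v^k\}_{k \in K}$ converges to $v^\infty$ where $K \subset \mathcal{K}$, then $\{v^{k+1}\}_{k \in K}$ converges to $v^\infty$.

Proof: Note that the sequence $\{\mathcal{L}(v^k)\}_k$ is a nonincreasing sequence which is bounded below by $\mathcal{L}(u^*)$. So, $\{\mathcal{L}(v^k)\}_k$ must converge. Since $K \subset \mathcal{K}$, the following must hold

$$\mathcal{L}(v^{k+1}) + \frac{m}{2c} \|v^{k+1} - v^k\|^2 \le \mathcal{L}(v^k) \quad \forall k \in K.$$

Following the same argument as in Lemma 2, it can be shown that

$$0 = \lim_{k \in K} \frac{m}{2c} \|v^{k+1} - v^k\|^2.$$

Since both $c$ and $m$ are positive, $\{v^{k+1}\}_{k \in K}$ must converge to $v^\infty$. □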
Theorem 7. If the cardinality of $\mathcal{K}$ is infinite, then every limit point of the sequence $\{v^k\}_{k \in \mathcal{K}}$ is a solution to $D$.

Proof: Let $\{v^k\}_{k \in K}$ be a convergent subsequence where $K \subset \mathcal{K}$. Since $v^{k+1} = u^{k+1}$ $\forall k \in K$ and $u^{k+1}$ is optimal to the master problem, the following must hold

$$L^k(v^{k+1}) + \frac{1}{2c} \|v^{k+1} - v^k\|^2 \le L^k(u) + \frac{1}{2c} \|u - v^k\|^2 \quad \forall u \in U \text{ and } k \in K.$$

Using the same analysis as in Theorem 3 and the result of Lemma 6, it can be shown that $\{v^k\}_{k \in K}$ converges to a solution $u^*$ of $D$. □

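When the cardinality of $\mathcal{K}$ is finite, define as before $\ell = \max\{k : k \in \mathcal{K}\} + 1$. So, every iteration $k \ge \ell$ must be a null step and the master problem in Step 2 must have the form:

$$u^{k+1} = \arg\min_{u \in U} \left\{ L^k(u) + \frac{1}{2c} \|u - v^\ell\|^2 \right\} \quad \forall k \ge \ell.$$

Next, let

$$F^k(u) = L^k(u) + \frac{1}{2c} \|u - v^\ell\|^2, \quad \text{and}$$

$$F^\infty(u) = L^\infty(u) + \frac{1}{2c} \|u - v^\ell\|^2.$$

Then, $F^k(u)$ epi-converges to $F^\infty(u)$ since $L^k(u)$ pointwise and monotonically converges to $L^\infty(u)$. (For the definition and properties of epi-convergence, see, e.g., the appendix in Wets, 1989.) In addition, it follows from Theorem A.2 of Wets (1989) that if $\{u^{k+1}\}_{k \in K} \to u^\infty$ for some $K \subset \{1, 2, 3, \dots\}$, then

$$u^\infty = \arg\min_{u \in U} F^\infty(u).$$

Furthermore, since $L^k(\cdot)$ also uniformly converges to $L^\infty(\cdot)$, there must exist, for any $\varepsilon > 0$, a sufficiently large $k_1$ such that

$$|L^k(u) - L^\infty(u)| < \varepsilon \quad \forall u \in U \text{ and } k \ge k_1,$$

and by setting $u = u^{k+1}$

$$\begin{aligned}
|L^k(u^{k+1}) - L^\infty(u^{k+1})| &< \varepsilon \quad \forall k \ge k_1 \\
|L^k(u^{k+1}) - \mathcal{L}(u^{k+1})| &< \varepsilon \quad \forall k \ge k_1 \\
\mathcal{L}(u^{k+1}) &< L^k(u^{k+1}) + \varepsilon \quad \forall k \ge k_1 \qquad (20)
\end{aligned}$$

where the middle inequality follows from (5).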
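Theorem 8. If the cardinality of $\mathcal{K}$ is finite, then $v^\ell$ is a solution to $D$.

Proof: Since $U$ is a compact set, there must exist a set $K \subset \{k : k \ge \ell\}$ such that $\{u^{k+1}\}_{k \in K}$ converges to $u^\infty$.

If $u^\infty = v^\ell$, then the above observation concerning the epi-convergence of $F^k(u)$ implies that

$$\begin{aligned}
\mathcal{L}(v^\ell) &= L^\infty(v^\ell) \\
&= L^\infty(u^\infty) \\
&= \min_{u \in U} \left( L^\infty(u) + \frac{1}{2c} \|u - v^\ell\|^2 \right) \\
&\le \min_{u \in U} \left\{ \mathcal{L}(u) + \frac{1}{2c} \|u - v^\ell\|^2 \right\} \\
&\le \mathcal{L}(v^\ell)
\end{aligned}$$

where the first equation follows from (5), the third from the observation that $u^\infty = \arg\min_{u \in U} F^\infty(u)$, the fourth from the fact that $L^\infty(u) \le \mathcal{L}(u)$ $\forall u \in U$, and the last from the fact that $v^\ell$ is an element of $U$. Thus,

$$\mathcal{L}(v^\ell) = \min_{u \in U} \left\{ \mathcal{L}(u) + \frac{1}{2c} \|u - v^\ell\|^2 \right\}.$$

However, this implies that $v^\ell$ solves $D$.

Assume that $u^\infty \ne v^\ell$. Let $\delta = \|u^\infty - v^\ell\|^2$. Then, there must exist a sufficiently large $k_2$ such that

$$\|u^{k+1} - v^\ell\|^2 > \frac{\delta}{2} \quad \forall k \ge k_2 \text{ and } k \in K. \qquad (21)$$

However, since $u^{k+1}$ solves the master problem in Step 2 with $v^\ell$ as its prox-center, it must be true that

$$L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - v^\ell\|^2 \le L^k(v^\ell) = \mathcal{L}(v^\ell), \quad \forall k \ge \ell,$$

where the equality follows from (4) in Section 3. For any $\varepsilon > 0$, let $k_1$ be as in (20) so that

$$\mathcal{L}(u^{k+1}) - \varepsilon + \frac{1}{2c} \|u^{k+1} - v^\ell\|^2 \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1).$$

Set $\varepsilon = \frac{(1 - m)\delta}{4c}$ and obtain

$$\mathcal{L}(u^{k+1}) + \frac{1}{2c} \left( \|u^{k+1} - v^\ell\|^2 - (1 - m) \frac{\delta}{2} \right) \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1).$$

Then, for any $k \ge \max(\ell, k_1, k_2)$ and $k \in K$, (21) implies that

$$\begin{aligned}
\mathcal{L}(v^\ell) &\ge \mathcal{L}(u^{k+1}) + \frac{1}{2c} \left( \|u^{k+1} - v^\ell\|^2 - (1 - m) \frac{\delta}{2} \right) \\
&> \mathcal{L}(u^{k+1}) + \frac{1}{2c} \left( \|u^{k+1} - v^\ell\|^2 - (1 - m) \|u^{k+1} - v^\ell\|^2 \right).
\end{aligned}$$

However, this implies that there must be a serious step after iteration $\ell$, which is a contradiction. Thus, every convergent subsequence of $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$, which is a solution to $D$. However, this implies that the sequence $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$ as well. □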
5. Conclusion



This paper presents a generic algorithm in the framework of proximal minimiza- 
tion. It is shown that this generic algorithm can be specialized to four different 
algorithms; they are the cutting plane algorithm, the cutting plane algorithm 
with line searches, the bundle methods (or proximal minimization with subgra- 
dient bundles, PMSB) and proximal minimization with cutting planes (PMCP). 
The first three can be found in the current literature; however, the last one is new. 

Besides the obvious relationship that all four algorithms can be derived from 
the generic algorithm, other relationships based on the master problem and con- 
vergence behavior are also established. Convergence proofs for PMSB and PMCP 
are also given. 



References 



[1] Auslender, A., (1987), ‘Numerical Methods for Nondifferentiable Convex Optimization,’ Mathematical Programming Study, 30, 102-126.

[2] Bazaraa, M. S. and Shetty, C. M., (1979), Nonlinear Programming: Theory 
and Algorithms, Wiley, New York, New York. 

[3] Bertsekas, D. P., (1982), ‘Projected Newton Methods for Optimization Prob- 
lems with Simple Constraints,’ SIAM Journal of Control and Optimization, 
20, 221-246. 

[4] Bertsekas, D. P., and Tsitsiklis, J. N., (1989), Parallel and Distributed Computation, Prentice Hall, Englewood Cliffs, New Jersey.

[5] Cheney, E. W., and Goldstein, A. A., (1959), ‘Newton’s Method for Convex 
Programming and Chebyshev Approximation,’ Numerische Mathematik, 1, 
253-268. 

[6] Dantzig, G. B., (1963), Linear Programming and Extensions, Princeton Uni- 
versity Press, Princeton, New Jersey. 

[7] Dantzig, G. B., and Wolfe, P., (1960), ‘Decomposition Principles for Linear Programs,’ Operations Research, 8, 101-111.

[8] Fukushima, M., (1984), ‘A Descent Algorithm for Nonsmooth Convex Pro- 
gramming,’ Mathematical Programming, 30, 163-175. 

[9] Gaudioso, M. and Monaco, M. F., (1982), ‘A Bundle Type Approach to the 
Unconstrained Minimization of Convex Nonsmooth Functions,’ Mathematical 
Programming, 23, 216-226. 

[10] Hearn, D. W., Lawphongpanich, S., and Nguyen, S., (1984), ‘Convex Programming Formulations of the Asymmetric Traffic Assignment Problem,’ Transportation Research, 18B, 357-365.

[11] Hearn, D. W. and Lawphongpanich, S., (1989a), ‘Lagrangian Dual Ascent by 
Generalized Linear Programming,’ Operations Research Letters, 8, 189-196. 

[12] Hearn, D. W. and Lawphongpanich, S., (1989b), ‘Generalized Linear Pro- 
gramming with Line Search,’ Proceedings of the 28th IEEE Conference on 
Decision and Control. 






[13] Hearn, D. W. and Lawphongpanich, S., (1990), ‘A Dual Ascent Algorithm for Traffic Assignment Problems,’ Transportation Research, 24, 423-430.

[14] Kelly, J. E., (1960), ‘The Cutting Plane Method for Solving Convex Pro- 
grams,’ Journal of SIAM, VIII, 703-712. 

[15] Kiwiel, K. C., (1985), Methods of Descent for Nondifferentiable Optimization, Lecture Notes in Mathematics 1133, Springer, Berlin, Germany.

[16] Kiwiel, K. C., (1989), ‘Proximity Control in Bundle Methods for Convex Nondifferentiable Minimization,’ Mathematical Programming, 46, 105-122.

[17] Kiwiel, K. C., (1991), ‘A Dual Method for Solving Certain Positive Semidefinite Quadratic Programming Problems,’ SIAM Journal on Scientific and Statistical Computing, to appear.

[18] Lemarechal, C., (1974), ‘An Algorithm for Minimizing Convex Functions,’ Proceedings of IFIP'74 Congress, J. L. Rosenfeld (Ed.), North Holland, Amsterdam, The Netherlands, 552-556.

[19] Lemarechal, C., (1975), ‘An Extension of Davidon Methods to Nondifferen- 
tiable Problems,’ Mathematical Programming Study, 3, 95-109. 

[20] Lemarechal, C., (1989), ‘Nondifferentiable Optimization,’ in Optimization, G. L. Nemhauser, A.H.G. Rinnooy Kan and M. J. Todd (Eds.), North Holland, New York, New York, 529-572.

[21] Lemarechal, C., (1991), ‘Lagrangian Decomposition and Nonsmooth Optimization: Bundle Algorithm, Prox Iteration, Augmented Lagrangian,’ Proceedings to the 1991 Conference on Nonsmooth Optimization: Methods and Applications at ERICE (Trapani), Sicily, Giannessi, F. (Ed.).

[22] Lin, Y. Y., and Pang, J.-S., (1987), ‘Iterative Methods for Large Convex 
Quadratic Programs; A Survey,’ SIAM Journal of Control and Optimization, 
25, 383-411. 

[23] Magnanti, T. L., Shapiro, J. F., and Wagner, M. H., (1976), ‘Generalized Linear Programming Solves the Dual,’ Management Science, 22, 1195-1203.

[24] Martinet, B., (1970), ‘Regularisation d'Inequations Variationnelles par Approximation Successives,’ Rev. Francaise Inf. Rech. Oper., 4, 154-159.






[25] Mifflin, R., (1977), ‘An Algorithm for Constrained Optimization with Semismooth Functions,’ Mathematics of Operations Research, 2, 191-207.

[26] Pang, J.-S., (1983), ‘Methods for Quadratic Programming: A Survey,’ Com- 
puters and Chemical Engineering, 7, 583-594. 

[27] Rockafellar, R. T., (1976), ‘Monotone Operators and the Proximal Point 
Algorithm,’ SIAM Journal of Control and Optimization, 14, 877-898. 

[28] Rudin, W., (1976), Principles of Mathematical Analysis, McGraw Hill, New 
York, New York. 

[29] Wets, R. J.-B., (1989), ‘Stochastic Programming,’ in Optimization, G. L. 
Nemhauser, A.H.G. Rinnooy Kan and M. J. Todd (Eds.), North Holland, 
New York, New York, 573-629. 

[30] Wolfe, P., (1975), ‘A Method of Conjugate Subgradient for Minimizing Non- 
differentiable Functions,’ Mathematical Programming Study, 3, 145-173. 

[31] Zangwill, W. I., (1969), Nonlinear Programming, Prentice Hall, Englewood 
Cliffs, New Jersey. 









